# Mathematical Reasoning Reinforcement Learning
## Nano Aha Moment 3b
A 3-billion-parameter language model trained with reinforcement learning to solve mathematical reasoning tasks, especially Countdown games.
Tags: Large Language Model, Transformers

Organization: McGill-NLP
## OREAL-32B-SFT
License: Apache-2.0
OREAL-32B-SFT is a supervised fine-tuned model based on Qwen2.5-32B, designed for mathematical reasoning tasks; it serves as the initial policy model for the OREAL reinforcement learning framework.
Tags: Large Language Model, Transformers

Organization: internlm